Automating the Generation of Semantic Annotation Schema Using a Clustering Technique

Authors

Vítor Souza

Nicola Zeni

Nadzeya Kiyavitskaya

Periklis Andritsos

Luisa Mich

John Mylopoulos

Abstract

In order to generate semantic annotations for a collection of documents, one needs an annotation schema consisting of a semantic model (a.k.a. ontology) along with lists of linguistic indicators (keywords and patterns) for each concept in the ontology. The focus of this paper is the automatic generation of the linguistic indicators for a given semantic model and a corpus of documents. Our approach needs a small number of user-defined seeds and bootstraps itself by exploiting a novel clustering technique. The baseline for this work is the Cerno project [8] and the clustering algorithm LIMBO [2]. We also present results that compare the output of the clustering algorithm with linguistic indicators created manually for two case studies.
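The idea of growing keyword lists from a few seeds can be illustrated with a minimal sketch. It is not the Cerno pipeline and it does not implement LIMBO: as a stand-in, each candidate word is assigned to the concept whose seed words occur in the most similar contexts, measured by Jensen-Shannon divergence. All names here (`context_counts`, `expand_seeds`, `max_divergence`, the window size, and the toy corpus) are assumptions made for illustration.

```python
import math
from collections import Counter

def context_counts(word, sentences, window=2):
    """Count tokens appearing within `window` positions of `word`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return counts

def normalize(counts):
    """Turn raw counts into a probability distribution (empty if no counts)."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two sparse distributions."""
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in set(p) | set(q)}
    def kl(a):
        return sum(v * math.log2(v / mix[k]) for k, v in a.items() if v > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def expand_seeds(seeds, sentences, candidates, max_divergence=0.3):
    """Attach each candidate to the closest concept profile, if close enough.

    Hypothetical stand-in for the clustering step: a concept profile is the
    pooled context distribution of that concept's seed words.
    """
    profiles = {}
    for concept, words in seeds.items():
        pooled = Counter()
        for w in words:
            pooled.update(context_counts(w, sentences))
        profiles[concept] = normalize(pooled)

    indicators = {concept: list(words) for concept, words in seeds.items()}
    for cand in candidates:
        dist = normalize(context_counts(cand, sentences))
        if not dist:
            continue  # candidate never occurs in the corpus
        scored = [(js_divergence(dist, prof), concept)
                  for concept, prof in profiles.items() if prof]
        if not scored:
            continue  # no concept has a usable profile
        best_score, best_concept = min(scored)
        if best_score <= max_divergence:
            indicators[best_concept].append(cand)
    return indicators

if __name__ == "__main__":
    corpus = [
        "we booked a hotel near the beach".split(),
        "we booked a hostel near the beach".split(),
        "we took a train to the city".split(),
        "we took a bus to the city".split(),
    ]
    seeds = {"Accommodation": ["hotel"], "Transport": ["train"]}
    print(expand_seeds(seeds, corpus, ["hostel", "bus"]))
```

In a realistic setting, the accepted candidates would be added to the seed lists and the procedure repeated, and the threshold would need tuning against manually built indicator lists such as those the abstract mentions for evaluation.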

Similar resources

Semantic-Based Image Retrieval in the VQ Compressed Domain using Image Annotation Statistical Models

Towards a Surface Realization-Oriented Corpus Annotation

Until recently, deep stochastic surface realization has been hindered by the lack of semantically annotated corpora. This is about to change. Such corpora are increasingly available, e.g., in the context of CoNLL shared tasks. However, recent experiments with CoNLL 2009 corpora show that these popular resources, which serve well for other applications, may not do so for generation. The attempts...

An Improved Semantic Schema Matching Approach

Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Processing (OLAP), data mining, the semantic web [2], and schema integration. The task is to find the semantic correspondences between elements of two schemas. Recently, schema matching has attracted considerable interest in both research and practice. In this paper, we present a new impr...

Learning Semantic Parsers Using Statistical Syntactic Parsing Techniques

Most recent work on semantic analysis of natural language has focused on “shallow” semantics such as word-sense disambiguation and semantic role labeling. Our work addresses a more ambitious task we call semantic parsing where natural language sentences are mapped to complete formal meaning representations. We present our system SCISSOR based on a statistical parser that generates a semanticall...

Clustering Schema Elements for Semantic Integration of Heterogeneous Data Sources

Interschema relationship identification (IRI), that is, determining the relationships among schema elements in heterogeneous data sources, is an important step in integrating the data sources. This article proposes a cluster analysis based approach to semi-automating the IRI process, which is typically very time-consuming and requires extensive human interaction. The authors apply multiple clus...

Publication date: 2008
